Overview

Dataset Statistics

Number of Variables 8
Number of Rows 3990
Missing Cells 828
Missing Cells (%) 2.6%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 16.3 MB
Average Row Size in Memory 4.2 KB
Variable Types
  • Categorical: 6
  • Numerical: 1
  • Geography: 1
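
The missing-cell percentage can be reproduced from the counts in this table. The sketch below assumes that the total cell count is rows times variables, and it uses the per-variable missing counts listed in the Variables section of this report; the variable names are illustrative only.

```python
# Figures copied from the Dataset Statistics table.
n_variables = 8
n_rows = 3990
missing_cells = 828

# Assumption: every row has one cell per variable.
total_cells = n_variables * n_rows
missing_pct = 100 * missing_cells / total_cells

# Per-variable missing counts as listed in the Variables section;
# all other variables report 0 missing.
missing_by_variable = {
    "companyName": 38,
    "company_starRating": 38,
    "company_salary": 752,
}

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print("per-variable counts add up:", sum(missing_by_variable.values()) == missing_cells)
```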

Dataset Insights

company_salary has 752 (18.85%) missing values (Missing)
companyName has a high cardinality: 2806 distinct values (High Cardinality)
company_offeredRole has a high cardinality: 2256 distinct values (High Cardinality)
company_salary has a high cardinality: 1738 distinct values (High Cardinality)
listing_jobDesc has a high cardinality: 3862 distinct values (High Cardinality)
requested_url has a high cardinality: 3990 distinct values (High Cardinality)
requested_url has all distinct values (Unique)

Variables


companyName

categorical

Approximate Distinct Count 2806
Approximate Unique (%) 71.0%
Missing 38
Missing (%) 1.0%
Memory Size 333206
  • The largest value (GVT Government Technology Agency (GovTech)) is over 1.93 times larger than the second largest value (Michael Page)

Length

Mean 19.3031
Standard Deviation 10.2893
Median 18
Minimum 3
Maximum 65

Sample

1st row Clyde&Co
2nd row LATHAM & WATKINS L...
3rd row Science Centre Boa...
4th row Science Centre Boa...
5th row Science Centre Sin...

Letter

Count 67661
Lowercase Letter 47633
Space Separator 6805
Uppercase Letter 20028
Dash Punctuation 65
Decimal Number 110
  • companyName contains many words: 3266 words
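
The Approximate Unique (%) figure appears to be the distinct count divided by the number of non-missing rows, not by all 3990 rows. This is inferred from the reported numbers and is not stated by the report itself; the helper name `unique_pct` is hypothetical.

```python
N_ROWS = 3990  # Number of Rows from the Dataset Statistics


def unique_pct(distinct: int, missing: int, rows: int = N_ROWS) -> float:
    """Distinct count as a percentage of non-missing rows (assumed definition)."""
    return 100 * distinct / (rows - missing)


# companyName: 2806 distinct values, 38 missing
company_name_pct = round(unique_pct(2806, 38), 1)

# company_salary: 1738 distinct values, 752 missing
company_salary_pct = round(unique_pct(1738, 752), 1)

print(company_name_pct, company_salary_pct)
```

Dividing by all 3990 rows instead would give about 70.3% for companyName and about 43.6% for company_salary, neither of which matches the reported values.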

company_starRating

numerical

Approximate Distinct Count 37
Approximate Unique (%) 0.9%
Missing 38
Missing (%) 1.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 63232
Mean 3.7968
Minimum 1
Maximum 5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • company_starRating is skewed left (γ1 = -0.5804)

Quantile Statistics

Minimum 1
5th Percentile 2.8
Q1 3.5
Median 3.8
Q3 4.2
95th Percentile 4.7
Maximum 5
Range 4
IQR 0.7

Descriptive Statistics

Mean 3.7968
Standard Deviation 0.5608
Variance 0.3145
Sum 15004.8
Skewness -0.5804
Kurtosis 1.4134
Coefficient of Variation 0.1477
  • company_starRating is not normally distributed (p-value 0.007059396226820826)
  • company_starRating has 51 outliers
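
Several of the company_starRating figures are derived from one another, which makes them easy to cross-check. The sketch below recomputes the variance, coefficient of variation, and IQR from the reported mean, standard deviation, and quartiles. The outlier fences use Tukey's 1.5 × IQR rule, which is an assumption: the report does not say which rule produced the 51 outliers.

```python
# Reported values for company_starRating.
mean, std = 3.7968, 0.5608
q1, q3 = 3.5, 4.2

variance = std ** 2   # square of the standard deviation
cv = std / mean       # coefficient of variation
iqr = q3 - q1         # interquartile range

# Assumed outlier rule (Tukey's fences); not confirmed by the report.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"variance={variance:.4f} cv={cv:.4f} iqr={iqr:.1f}")
print(f"fences=({lower_fence:.2f}, {upper_fence:.2f})")
```

Under that assumed rule the upper fence lies above the maximum rating of 5, so any flagged values would sit at the low end, which is consistent with the negative skewness.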

company_offeredRole

categorical

Approximate Distinct Count 2256
Approximate Unique (%) 56.5%
Missing 0
Missing (%) 0.0%
Memory Size 373382

Length

Mean 27.6767
Standard Deviation 14.6731
Median 23
Minimum 7
Maximum 131

Sample

1st row CRM Insight Analys...
2nd row Conflicts Reportin...
3rd row Digital Experience...
4th row Digital Experience...
5th row Digital Experience...

Letter

Count 94960
Lowercase Letter 77520
Space Separator 11701
Uppercase Letter 17440
Dash Punctuation 666
Decimal Number 562
  • company_offeredRole contains many words: 1528 words

company_salary

categorical

Approximate Distinct Count 1738
Approximate Unique (%) 53.7%
Missing 752
Missing (%) 18.9%
Memory Size 268741

Length

Mean 17.996
Standard Deviation 6.9749
Median 14
Minimum 4
Maximum 35

Sample

1st row 5000 - 8000
2nd row 4000
3rd row 48000 - 72000
4th row 35000 - 78000
5th row 4000 - 5000

Letter

Count 14060
Lowercase Letter 12522
Space Separator 6130
Uppercase Letter 1538
Dash Punctuation 3065
Decimal Number 32361
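
company_salary is typed as categorical because its values are strings such as "5000 - 8000" or "4000". A minimal sketch of converting them to numeric bounds follows. The function name `parse_salary` and the regular expression are hypothetical, and the pattern only covers the two shapes visible in the sample rows. The letter counts above indicate that some values also contain text, and those are returned as `None` here instead of being guessed at.

```python
import re
from typing import Optional, Tuple

# Matches "4000" or "5000 - 8000"; anything else is left unparsed.
_SALARY_RE = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+))?\s*$")


def parse_salary(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (low, high) for a salary string, or None if it does not match."""
    if value is None:
        return None
    match = _SALARY_RE.match(value)
    if match is None:
        return None
    low = int(match.group(1))
    # A single figure is treated as a degenerate range.
    high = int(match.group(2)) if match.group(2) else low
    return low, high


for raw in ["5000 - 8000", "4000", "48000 - 72000", None]:
    print(repr(raw), "->", parse_salary(raw))
```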

listing_jobDesc

categorical

Approximate Distinct Count 3862
Approximate Unique (%) 96.8%
Missing 0
Missing (%) 0.0%
Memory Size 14040780

Length

Mean 2207.8774
Standard Deviation 1361.455
Median 2009
Minimum 12
Maximum 19739

Sample

1st row Work with speciali...
2nd row Explaining to atto...
3rd row Bachelor's or equi...
4th row What the role is u...
5th row Recommend AI and A...

Letter

Count 7323519
Lowercase Letter 7033913
Space Separator 1196372
Uppercase Letter 289606
Dash Punctuation 17456
Decimal Number 23004
  • listing_jobDesc contains many words: 29570 words
  • The largest value (data) is over 2.03 times larger than the second largest value (experience)

requested_url

categorical

Approximate Distinct Count 3990
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 1615287

Length

Mean 339.8338
Standard Deviation 251.9737
Median 257
Minimum 252
Maximum 1215

Sample

1st row https://www.glassd...
2nd row https://www.glassd...
3rd row https://www.glassd...
4th row https://www.glassd...
5th row https://www.glassd...

Letter

Count 764252
Lowercase Letter 578543
Space Separator 0
Uppercase Letter 185709
Dash Punctuation 26037
Decimal Number 424735
  • requested_url contains many words: 3990 words

role_type

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 308936
  • The largest value (data_analyst) is over 3.33 times larger than the second largest value (data_engineer)

Length

Mean 12.4276
Standard Deviation 0.6833
Median 12
Minimum 12
Maximum 14

Sample

1st row data_analyst
2nd row data_analyst
3rd row data_analyst
4th row data_analyst
5th row data_analyst

Letter

Count 45596
Lowercase Letter 45596
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (data_analyst, data_engineer) take over 50.0%

location

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 308256
  • The largest value (United States) is over 4.38 times larger than the second largest value (Singapore)

Length

Mean 12.2571
Standard Deviation 1.5557
Median 13
Minimum 9
Maximum 13

Sample

1st row Singapore
2nd row Singapore
3rd row Singapore
4th row Singapore
5th row Singapore

Letter

Count 45657
Lowercase Letter 38418
Space Separator 3249
Uppercase Letter 7239
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (United States, Singapore) take over 50.0%
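
The per-category row counts for location are not printed, but they can be recovered from the character statistics. The sketch assumes what the report states (two distinct values, no missing rows) and that every space in the column comes from a "United States" value.

```python
n_rows = 3990
space_count = 3249  # Space Separator total for location

# "United States" contains exactly one space; "Singapore" contains none.
us_rows = space_count
sg_rows = n_rows - us_rows

# Recompute the reported length mean and the largest/second-largest ratio.
mean_length = (len("United States") * us_rows + len("Singapore") * sg_rows) / n_rows
ratio = us_rows / sg_rows

# Uppercase letters: two per "United States", one per "Singapore".
uppercase = 2 * us_rows + sg_rows

print(us_rows, sg_rows)
print(f"mean length={mean_length:.4f} ratio={ratio:.2f} uppercase={uppercase}")
```

The recomputed mean length, ratio, and uppercase count agree with the reported 12.2571, "over 4.38 times", and 7239, which supports the inferred split between the two locations.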

Interactions

Correlations

Missing Values